automation and human effort
Thread by @AnthropicAI on Thread Reader App – Thread Reader App
Anthropic Dec 19 • 11 tweets • 5 min read Bookmark Save as PDF My Authors It's hard work to make evaluations for language models (LMs). We've developed an automated way to generate evaluations with LMs, significantly reducing the effort involved. We test LMs using 150 LM-written evaluations, uncovering novel LM behaviors. In the simplest case, we generated thousands of yes-no questions for diverse behaviors just by instructing an LM (and filtering out bad examples with another LM). In the simplest case, we generated thousands of yes-no questions for diverse behaviors just by instructing an LM (and filtering out bad examples with another LM).